Parallel CABAC Decoding Using Entropy Slices

ABSTRACT

A method of video encoding is provided that includes performing context-adaptive binary arithmetic coding (CABAC) on a plurality of syntax element values in a slice to generate a plurality of entropy-encoded syntax element values, generating an entropy slice header to identify the plurality of entropy-encoded syntax element values as an entropy slice, wherein the entropy slice header comprises context model initialization information, and outputting the entropy slice header and the plurality of entropy encoded syntax element values.

CLAIM OF PRIORITY UNDER 35 U.S.C. 119(e)

The present application claims priority to U.S. Provisional Patent Application No. 61/106,323, filed Oct. 17, 2008, entitled “Method and Apparatus for Video Processing in Context-Adaptive Binary Arithmetic Coding,” which is incorporated by reference herein in its entirety.

BACKGROUND OF THE INVENTION

The demand for digital video products continues to increase. Some examples of applications for digital video include video communication, security and surveillance, industrial automation, and entertainment (e.g., DV, HDTV, satellite TV, set-top boxes, Internet video streaming, digital cameras, video jukeboxes, high-end displays and personal video recorders). Further, video applications are becoming increasingly mobile as a result of higher computation power in handsets, advances in battery technology, and high-speed wireless connectivity.

Video compression and decompression is an essential enabler for digital video products. Compression-decompression (CODEC) algorithms enable storage and transmission of digital video. Typically codecs are industry standards such as MPEG-2, MPEG-4, H.264/AVC, etc. At the core of all of these standards is the hybrid video coding technique of block motion compensation (prediction) plus transform coding of prediction error. Block motion compensation is used to remove temporal redundancy between successive pictures (frames or fields) by prediction from prior pictures, whereas transform coding is used to remove spatial redundancy within each block.

Many block motion compensation schemes basically assume that between successive pictures, i.e., frames, in a video sequence, an object in a scene undergoes a displacement in the x- and y-directions and these displacements define the components of a motion vector. Thus, an object in one picture can be predicted from the object in a prior picture by using the motion vector of the object. To track visual differences from frame-to-frame, each frame is tiled into blocks often referred to as macroblocks. Block-based motion estimation algorithms are used to generate a set of vectors to describe block motion flow between frames, thereby constructing a motion-compensated prediction of a frame. The vectors are determined using block-matching procedures that try to identify the most similar blocks in the current frame with those that have already been encoded in prior frames.
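As a rough illustration of the block-matching idea described above, the following Python sketch performs an exhaustive search using the sum of absolute differences (SAD) as the similarity measure. The SAD metric, the full-search strategy, and all function and parameter names are assumptions chosen for illustration; the text does not prescribe a particular matching procedure.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized pixel blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def get_block(frame, top, left, size):
    """Extract a size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def best_motion_vector(cur, ref, top, left, size, search_range):
    """Search ref around (top, left) for the block most similar to the
    block of cur at (top, left). Returns (cost, dx, dy) or None."""
    target = get_block(cur, top, left, size)
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = sad(target, get_block(ref, y, x, size))
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

The returned (dx, dy) pair plays the role of the motion vector: it is the displacement at which the prior frame best predicts the current block.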

Context-adaptive binary arithmetic coding (CABAC) is a form of entropy coding used in video coding such as H.264/MPEG-4 AVC. As such it is an inherently lossless compression technique. It is notable for providing considerably better compression than most other encoding algorithms used in video encoding and is considered one of the primary advantages of the H.264/AVC encoding scheme. CABAC is only supported in Main and higher profiles and requires a considerable amount of processing to decode compared to other similar algorithms. As a result, context-adaptive variable-length coding (CAVLC), a lower efficiency entropy encoding scheme, is sometimes used instead to increase performance on slower playback devices. CABAC achieves 9%-14% better compression compared to CAVLC, with the cost of increased complexity.

The theory and operation of CABAC encoding for H.264 is fully defined in the International Telecommunication Union, Telecommunication Standardization Sector (ITU-T) standard “Advanced video coding for generic audiovisual services” H.264, revision 03/2005 or later. General principles are explained in detail in “Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard,” Detlev Marpe, July 2003. In brief, CABAC has multiple probability models for different contexts, i.e., multiple context models. It first converts all non-binary symbols to binary. Then, for each bit, the CABAC coder selects which probability model to use, then uses information from nearby elements to optimize the probability estimate. Arithmetic coding is then applied to compress the data.

Efficient coding of syntax element values in a hybrid block-based video coder, such as components of motion vector differences or transform-coefficient level values, can be achieved by employing a binarization scheme as a kind of preprocessing unit for the subsequent stages of context modeling and binary arithmetic coding. In general, a binarization scheme defines a unique mapping of syntax element values to sequences of binary decisions, so-called bins, which can also be interpreted in terms of a binary code tree.
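One simple example of such a mapping is unary binarization, in which a non-negative value N is represented by N bins equal to 1 followed by a terminating 0 bin. The Python sketch below is illustrative only; the function names are hypothetical, and unary coding is just one of several binarization schemes that a coder may use.

```python
def unary_bins(value):
    """Map a non-negative integer to its unary bin string."""
    if value < 0:
        raise ValueError("unary binarization needs a non-negative value")
    return [1] * value + [0]


def decode_unary(bins):
    """Decode the first unary code word in bins.

    Returns (value, number of bins consumed)."""
    value = 0
    for b in bins:
        if b == 0:
            return value, value + 1
        value += 1
    raise ValueError("unterminated unary code word")
```

Because every code word ends at its first 0 bin, no code word is a prefix of another, which is what makes the mapping unique and allows consecutive bin strings to be separated without delimiters.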

Once each syntax element value has been decomposed into a sequence of bins, further processing of each bin value in CABAC depends on the associated coding-mode decision, which can be either the regular or the bypass mode. Bypass mode is typically used for bins that are assumed to be uniformly distributed. In the regular coding mode, each bin value is encoded by using the regular binary arithmetic-coding engine, where the associated probability model is either determined by a fixed choice, without any context modeling, or adaptively chosen depending on a related context model. Context models are identified by a context index that is selected from 460 possible values (except High 4:4:4 Intra and High 4:4:4 Predictive profiles). Further, each context model is determined by two parameters, a probability state index (pStateIdx) representing the current estimate of the probability of the least probable symbol (LPS) and the binary value of the current most probable symbol (MPS). These two parameters are referred to as context state variables or context variables. Default initial values defined in the H.264 standard are used to initialize the context variables for a context model and the variable values are updated after each bin is encoded.
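The following Python sketch shows how a pair of context variables might be derived from default initialization parameters. It follows the derivation in clause 9.3.1.1 of H.264 as best understood here, in which each context model has two table-defined parameters (m, n) and the initial state also depends on the slice quantization parameter; it should be checked against the standard before being relied on. The function names are hypothetical, and the (m, n) values in the usage note are made up rather than taken from the standard's tables.

```python
def clip3(low, high, value):
    """Clamp value to the inclusive range [low, high]."""
    return max(low, min(high, value))


def init_context(m, n, slice_qp):
    """Return (p_state_idx, val_mps) for one context model.

    m and n are the per-context initialization parameters; slice_qp is
    the luma quantization parameter of the slice."""
    pre_ctx_state = clip3(1, 126, ((m * clip3(0, 51, slice_qp)) >> 4) + n)
    if pre_ctx_state <= 63:
        return 63 - pre_ctx_state, 0   # LPS-side half: MPS is 0
    return pre_ctx_state - 64, 1       # MPS-side half: MPS is 1
```

For example, with the made-up parameters m = 20, n = -15 and a slice QP of 26, init_context(20, -15, 26) yields a probability state index of 46 with an MPS value of 0.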

For bypass mode, complexity of the arithmetic coding is significantly reduced. For regular arithmetic coding, encoding of the given bin value depends on the actual state of the associated adaptive probability model that is passed along with the bin value to the multiplication-free Modulo (M) coder, which is a table-based binary arithmetic coding engine used in CABAC. Probability estimation in CABAC is based on a table-driven estimator in which each probability model can take one of 64 different states with associated probability values p ranging in the interval 0.0-0.5. The distinction between the least probable symbol (LPS) and the most probable symbol (MPS) allows each state to be specified by means of the corresponding LPS-related probability pLPS.
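According to the Marpe paper cited above, the 64 LPS probability values form a geometric progression that starts at 0.5 and ends at 0.01875. Assuming that description, the table can be approximated in Python as follows; the constant and function names are hypothetical, and the integer lookup tables actually used by the M coder are not reproduced here.

```python
NUM_STATES = 64
P_MAX = 0.5        # LPS probability of state 0
P_MIN = 0.01875    # LPS probability of state 63 (per the cited paper)

# Common ratio of the geometric progression between successive states.
ALPHA = (P_MIN / P_MAX) ** (1.0 / (NUM_STATES - 1))


def lps_probabilities():
    """Return the approximate LPS probability for each of the 64 states."""
    return [P_MAX * ALPHA ** state for state in range(NUM_STATES)]
```

A higher state index therefore corresponds to a more skewed model, in which the MPS is increasingly likely.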

The use of CABAC can create a performance bottleneck for hardware decoders that must meet high-definition (HD) video requirements. One possible solution to the CABAC throughput problem is to structure the output of a video encoder to create data partitions that are independently decodable by a CABAC decoder in a video decoder. These data partitions could then be decoded in parallel using multiple processors. One suggested partitioning approach for next generation video coding, referred to as entropy slices, is described in “Parallel Entropy Decoding for High Resolution Video Coding,” Jie Zhao and Andrew Segall, SPIE Vol. 7257, 725706-1-725706-11, January, 2009 (“Zhao”).

In general, an entropy slice as described in Zhao is similar to the slice concept used in H.264 but it is only applied to entropy coding and decoding. The Zhao entropy slice includes a sequence of entropy encoded macroblocks. Syntax, i.e., an entropy header, is included in the output bit stream of the entropy encoder to identify the start of each entropy slice. Further, each entropy slice is defined such that it can be decoded by a CABAC entropy decoder independent of other entropy slices. Other specific features of the Zhao entropy slice are that in the CABAC entropy decoder, all context models are reinitialized to their initial default states at the beginning of each entropy slice. Further, during decoding, the context state is only updated within an entropy slice, and context model updates are not made across entropy slice boundaries. In addition, macroblocks in other entropy slices are marked as unavailable for the purpose of entropy decoding.

BRIEF DESCRIPTION OF THE DRAWINGS

Particular embodiments in accordance with the invention will now be described, by way of example only, and with reference to the accompanying drawings:

FIG. 1 shows a block diagram of a video encoding/decoding system in accordance with one or more embodiments of the invention;

FIG. 2 shows a block diagram of a video encoder in accordance with one or more embodiments of the invention;

FIG. 3 shows a block diagram of a video decoder in accordance with one or more embodiments of the invention;

FIG. 4 shows a flow diagram of a method of CABAC encoding in accordance with one or more embodiments of the invention;

FIGS. 5 and 6 show flow diagrams of methods of parallel CABAC decoding in accordance with one or more embodiments of the invention; and

FIGS. 7-9 show block diagrams of illustrative digital systems in accordance with one or more embodiments of the invention.

DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION

Specific embodiments of the invention will now be described in detail with reference to the accompanying figures. Like elements in the various figures are denoted by like reference numerals for consistency.

Certain terms are used throughout the following description and the claims to refer to particular system components. As one skilled in the art will appreciate, components in digital systems may be referred to by different names and/or may be combined in ways not shown herein without departing from the described functionality. This document does not intend to distinguish between components that differ in name but not function. In the following discussion and in the claims, the terms “including” and “comprising” are used in an open-ended fashion, and thus should be interpreted to mean “including, but not limited to . . . .” Also, the term “couple” and derivatives thereof are intended to mean an indirect, direct, optical, and/or wireless electrical connection. Thus, if a first device couples to a second device, that connection may be through a direct electrical connection, through an indirect electrical connection via other devices and connections, through an optical electrical connection, and/or through a wireless electrical connection.

In the following detailed description of embodiments of the invention, numerous specific details are set forth in order to provide a more thorough understanding of the invention. However, it will be apparent to one of ordinary skill in the art that the invention may be practiced without these specific details. In other instances, well-known features have not been described in detail to avoid unnecessarily complicating the description. In addition, although method steps may be presented and described herein in a sequential fashion, one or more of the steps shown and described may be omitted, repeated, performed concurrently, and/or performed in a different order than the order shown in the figures and/or described herein. Accordingly, embodiments of the invention should not be considered limited to the specific ordering of steps shown in the figures and/or described herein. Further, while various embodiments of the invention are described herein in accordance with the H.264 video coding standard, embodiments for other video coding standards will be understood by one of ordinary skill in the art. Accordingly, embodiments of the invention should not be considered limited to the H.264 video coding standard.

In the description herein, some terminology is used that is specifically defined in the H.264 video coding standard and/or is well understood by those of ordinary skill in the art in CABAC coding. Definitions of these terms are not provided in the interest of brevity. Further, this terminology is used for convenience of explanation and should not be considered as limiting embodiments of the invention to the H.264 standard. One of ordinary skill in the art will appreciate that different terminology may be used in other video encoding standards without departing from the described functionality.

As previously mentioned, the use of entropy slices has been suggested for enabling parallel entropy decoding in video decoders. However, some studies have shown that using the entropy slices as described in Zhao may cause a bit rate increase in video encoding under some circumstances that diminishes the desirability of using CABAC over other, less complex variable length coding (VLC) techniques. There are two primary reasons for the observed bit rate increase. The first reason is the requirement that the context model probability states used in CABAC are reset to their initial, default states at the beginning of each entropy slice. As the entropy slice size is decreased, the resets of the context models occur more frequently. Frequent resets of the context models to their default states reduce probability model accuracy and impede compression efficiency. The second reason is that the selection of context models for some syntax elements in CABAC, such as motion vector difference, is improved by using information from neighboring macroblocks. The top neighbors of blocks in the upper row of an entropy slice are not available. As the size of an entropy slice is reduced, the percentage of blocks in the top row will increase, thus reducing the accuracy of the context models that rely on information from neighboring macroblocks.
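The second effect can be made concrete with a small calculation. The Python sketch below assumes, purely for illustration, that macroblocks are assigned to entropy slices in raster order with a fixed number of macroblocks per slice, and counts how many macroblocks have a top neighbor that is either outside the picture or in a different entropy slice. The function and parameter names are hypothetical.

```python
def unavailable_top_fraction(width_mbs, height_mbs, slice_size_mbs):
    """Fraction of macroblocks whose top neighbor cannot be used for
    context selection, for raster-order slices of slice_size_mbs."""
    total = width_mbs * height_mbs
    unavailable = 0
    for idx in range(total):
        top = idx - width_mbs          # raster index of the MB above
        outside_picture = top < 0
        other_slice = (not outside_picture
                       and top // slice_size_mbs != idx // slice_size_mbs)
        if outside_picture or other_slice:
            unavailable += 1
    return unavailable / total
```

For a picture 4 macroblocks wide and 4 high, a single 16-macroblock slice leaves 25% of the macroblocks without a usable top neighbor, two 8-macroblock slices raise this to 50%, and four 4-macroblock slices raise it to 100%.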

Embodiments of the invention provide CABAC encoding and decoding based on entropy slices that may reduce the observed bit rate increases attributed to the above two reasons. More specifically, in some embodiments of the invention, additional information for context model initialization is included in the entropy header of an entropy slice by the CABAC entropy encoder. As is described in more detail below, a CABAC entropy decoder can use this additional information to initialize the state of selected context models rather than resetting the context models to their default initial states. Further, in some embodiments of the invention, the parallel decoding of entropy slices is structured such that information from previously decoded entropy slices may be used to estimate the initial context states for context models in subsequent entropy slices.
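A minimal Python sketch of the decoder-side idea follows. It assumes that context states are held in a dictionary keyed by context index, that each state is a (pStateIdx, MPS) pair, and that the entropy header carries explicit initial values for the selected context models; these representations and all names are assumptions made for illustration.

```python
def initialize_contexts(default_states, header_init):
    """Build the context state table for one entropy slice.

    default_states: {context index: (p_state_idx, val_mps)} defaults.
    header_init:    {context index: (p_state_idx, val_mps)} overrides
                    carried in the entropy header (may be empty)."""
    states = dict(default_states)      # start from the default states
    for ctx_idx, state in header_init.items():
        if ctx_idx not in states:
            raise KeyError(f"unknown context index {ctx_idx}")
        states[ctx_idx] = state        # override selected models only
    return states
```

Context models that are not mentioned in the header keep their default initial states, so an empty header reproduces the reset behavior described in Zhao.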

FIG. 1 shows a block diagram of a video encoding/decoding system in accordance with one or more embodiments of the invention. The video encoding/decoding system performs encoding and decoding of digital video sequences using methods for CABAC entropy encoding and decoding as described herein. The system includes a source digital system (100) that transmits encoded video sequences to a destination digital system (102) via a communication channel (116). The source digital system (100) includes a video capture component (104), a video encoder component (106), and a transmitter component (108). The video capture component (104) is configured to provide a video sequence to be encoded by the video encoder component (106). The video capture component (104) may be, for example, a video camera, a video archive, or a video feed from a video content provider. The video capture component (104) may generate computer graphics as the video sequence, or a combination of live video and computer-generated video.

The video encoder component (106) receives a video sequence from the video capture component (104) and encodes it for transmission by the transmitter component (108). In general, the video encoder component (106) receives the video sequence from the video capture component (104) as a sequence of video frames, divides the frames into coding units which may be a whole frame or a slice of a frame, divides the coding units into blocks of pixels, and encodes the video data in the coding units based on these blocks. The video encoder (106) includes functionality to perform one or more embodiments of methods for CABAC entropy encoding as described herein.

The transmitter component (108) transmits the encoded video data to the destination digital system (102) via the communication channel (116). The communication channel (116) may be any communication medium, or combination of communication media suitable for transmission of the encoded video sequence, such as, for example, wired or wireless communication media, a local area network, or a wide area network. The video capture and encoding may take place at a different location and time than the transmission. For example, television programs and movies may be produced, encoded and stored on a disc or other storage devices. The stored movie or program may then be transmitted at a later time.

The destination digital system (102) includes a receiver component (110), a video decoder component (112) and a display component (114). The receiver component (110) receives the encoded video data from the source digital system (100) via the communication channel (116) and provides the encoded video data to the video decoder component (112) for decoding. In general, the video decoder component (112) reverses the encoding process performed by the video encoder component (106) to reconstruct the frames of the video sequence. The video decoder component (112) includes functionality to perform one or more embodiments of methods for CABAC entropy decoding as described herein. The reconstructed video sequence may then be displayed on the display component (114). The display component (114) may be any suitable display device such as, for example, a plasma display, a liquid crystal display (LCD), a light emitting diode (LED) display, etc.

In some embodiments of the invention, the source digital system (100) may also include a receiver component and a video decoder component and/or the destination digital system (102) may include a transmitter component and a video encoder component for transmission of video sequences in both directions for video streaming, video broadcasting, and video telephony. Further, the video encoder component (106) and the video decoder component (112) may perform encoding and decoding in accordance with a video compression standard such as, for example, the Moving Picture Experts Group (MPEG) video compression standards, e.g., MPEG-1, MPEG-2, and MPEG-4, the ITU-T video compression standards, e.g., H.263 and H.264, the Society of Motion Picture and Television Engineers (SMPTE) 421M video CODEC standard (commonly referred to as “VC-1”), the video compression standard defined by the Audio Video Coding Standard Workgroup of China (commonly referred to as “AVS”), etc. The video encoder component (106) and the video decoder component (112) may be implemented in any suitable combination of software, firmware, and hardware, such as, for example, one or more digital signal processors (DSPs), microprocessors, discrete logic, application specific integrated circuits (ASICs), etc.

FIG. 2 shows a block diagram of a video encoder, e.g., the video encoder (106), in accordance with one or more embodiments of the invention. In the video encoder of FIG. 2, input frames (200) for encoding are provided as one input of a motion estimation component (220), as one input of an intraframe prediction component (224), and to a positive input of a combiner (202) (e.g., adder or subtractor or the like). The frame storage component (218) provides reference data to the motion estimation component (220) and to the motion compensation component (222). The reference data may include one or more previously encoded and decoded frames. The motion estimation component (220) provides motion estimation information to the motion compensation component (222) and the entropy encoder component (234). More specifically, the motion estimation component (220) performs tests based on the prediction modes to choose the best motion vector(s)/prediction mode. The motion estimation component (220) provides the selected motion vector (MV) or vectors and the selected prediction mode to the motion compensation component (222) and the selected motion vector (MV) to the entropy encoder component (234).

The motion compensation component (222) provides motion compensated prediction information to a selector switch (226) that includes motion compensated interframe prediction macroblocks (MBs). The intraframe prediction component also provides intraframe prediction information to switch (226) that includes intraframe prediction MBs and a prediction mode. That is, similar to the motion estimation component (220), the intraframe prediction component performs tests based on prediction modes to choose the best prediction mode for generating the intraframe prediction MBs.

The switch (226) selects between the motion-compensated interframe prediction MBs from the motion compensation component (222) and the intraframe prediction MBs from the intraframe prediction component (224) based on the selected prediction mode. The output of the switch (226) (i.e., the selected prediction MB) is provided to a negative input of the combiner (202) and to a delay component (230). The output of the delay component (230) is provided to another combiner (i.e., an adder) (238). The combiner (202) subtracts the selected prediction MB from the current MB of the current input frame to provide a residual MB to the transform component (204). The resulting residual MB is a set of pixel difference values that quantify differences between pixel values of the original MB and the prediction MB. The transform component (204) performs a block transform, such as a DCT, on the residual MB to convert the residual pixel values to transform coefficients and outputs the transform coefficients.

The transform coefficients are provided to a quantization component (206) which outputs quantized transform coefficients. Because the DCT transform redistributes the energy of the residual signal into the frequency domain, the quantized transform coefficients are taken out of their raster-scan ordering by a scan component (208) and arranged by significance, generally beginning with the more significant coefficients followed by the less significant. The ordered quantized transform coefficients are provided via the scan component (208) to the entropy encoder component (234).
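One common way to realize such a reordering is a zigzag scan, which visits the anti-diagonals of the coefficient block so that low-frequency coefficients come first. The Python sketch below generates that order for an n x n block; the use of a zigzag pattern and the function names are assumptions for illustration, since the text does not fix a particular scan.

```python
def zigzag_order(n):
    """Return the (row, column) visiting order of a zigzag scan over an
    n x n block, starting at the top-left (lowest frequency) position."""
    order = []
    for s in range(2 * n - 1):         # s = row + column of a diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)      # even diagonals run bottom to top
        order.extend((r, s - r) for r in rows)
    return order


def scan_block(block):
    """Reorder a square block of coefficients into a zigzag-ordered list."""
    return [block[r][c] for r, c in zigzag_order(len(block))]
```

For a 4x4 block stored in raster order, this visits raster positions 0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15, which tends to group the trailing zero coefficients at the end of the list.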

The entropy encoder component (234) performs entropy encoding on encoded macroblocks to generate a compressed bit stream (236) for transmission or storage. The entropy encoder component (234) may include functionality to perform one or more of any suitable entropy encoding techniques, such as, for example, context adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), run length coding, etc. In one or more embodiments of the invention, the entropy encoder component (234) includes functionality to perform one or more embodiments of methods for CABAC entropy encoding as described herein.

Inside every encoder is an embedded decoder. As any compliant decoder is expected to reconstruct an image from a compressed bit stream, the embedded decoder provides the same utility to the video encoder. Knowledge of the reconstructed input allows the video encoder to transmit the appropriate residual energy to compose subsequent frames. To determine the reconstructed input, the ordered quantized transform coefficients provided via the scan component (208) are returned to their original post-DCT arrangement by an inverse scan component (210), the output of which is provided to a dequantize component (212), which outputs estimated transformed information, i.e., an estimated or reconstructed version of the transform result from the transform component (204). The estimated transformed information is provided to the inverse transform component (214), which outputs estimated residual information which represents a reconstructed version of the residual MB. The reconstructed residual MB is provided to the combiner (238). The combiner (238) adds the delayed selected predicted MB to the reconstructed residual MB to generate an unfiltered reconstructed MB, which becomes part of reconstructed frame information. The reconstructed frame information is provided via a buffer (228) to the intraframe prediction component (224) and to a filter component (216). The filter component (216) is a deblocking filter which filters the reconstructed frame information and provides filtered reconstructed frames to frame storage component (218).
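The role of the embedded decoder can be sketched in a few lines of Python. The sketch below deliberately omits the transform, scan, and deblocking stages and uses a plain uniform quantizer with a single step size; these simplifications and all names are assumptions made for illustration. What it shows is that the encoder reconstructs each block from the quantized data, exactly as a decoder would, rather than from the original pixels.

```python
def quantize(residual, step):
    """Uniform scalar quantization of residual values."""
    return [int(round(v / step)) for v in residual]


def dequantize(levels, step):
    """Inverse of quantize, up to the quantization error."""
    return [q * step for q in levels]


def reconstruct(prediction, levels, step):
    """Decoder-side reconstruction: prediction plus dequantized residual."""
    return [p + r for p, r in zip(prediction, dequantize(levels, step))]


def encode_block(current, prediction, step):
    """Encode one block and return (levels, reconstructed block)."""
    residual = [c - p for c, p in zip(current, prediction)]
    levels = quantize(residual, step)
    # Embedded decoder: reuse the decoder's own reconstruction path.
    return levels, reconstruct(prediction, levels, step)
```

Because encode_block calls the same reconstruct function a decoder would call, the reference data held by the encoder matches what the decoder will hold, even though it differs from the original input by the quantization error.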

FIG. 3 shows a block diagram of a video decoder, e.g., the video decoder (112), in accordance with one or more embodiments of the invention. In the video decoder of FIG. 3, the entropy decoder component (300) receives an entropy encoded video bit stream and reverses the entropy encoding to recover the encoded macroblocks. The entropy decoding performed by the entropy decoder component (300) may include functionality to perform one or more of any suitable entropy decoding techniques, such as, for example, context adaptive variable length decoding (CAVLC), context adaptive binary arithmetic decoding (CABAC), run length decoding, etc. In one or more embodiments of the invention, the entropy decoder component (300) includes functionality to perform one or more embodiments of methods for CABAC entropy decoding as described herein.

The inverse scan and dequantization component (302) assembles the macroblocks in the video bit stream in raster scan order and substantially recovers the original frequency domain data. The inverse transform component (304) transforms the frequency domain data from inverse scan and dequantization component (302) back to the spatial domain. This spatial domain data supplies one input of the addition component (306). The other input of addition component (306) comes from the macroblock mode switch (308). When inter-prediction mode is signaled in the encoded video stream, the macroblock mode switch (308) selects the output of the motion compensation component (310). The motion compensation component (310) receives reference frames from frame storage (312) and applies the motion compensation computed by the encoder and transmitted in the encoded video bit stream. When intra-prediction mode is signaled in the encoded video stream, the macroblock mode switch (308) selects the output of the intra-prediction component (314). The intra-prediction component (314) applies the intra-prediction computed by the encoder and transmitted in the encoded video bit stream.

The addition component (306) recovers the predicted frame. The output of addition component (306) supplies the input of the deblocking filter component (316). The deblocking filter component (316) smoothes artifacts created by the block and macroblock nature of the encoding process to improve the visual quality of the decoded frame. In one or more embodiments of the invention, the deblocking filter component (316) applies a macroblock-based loop filter for regular decoding to maximize performance and applies a frame-based loop filter for frames encoded using flexible macroblock ordering (FMO) and for frames encoded using arbitrary slice order (ASO). The macroblock-based loop filter is performed after each macroblock is decoded, while the frame-based loop filter delays filtering until all macroblocks in the frame have been decoded.

More specifically, because a deblocking filter processes pixels across macroblock boundaries, the neighboring macroblocks are decoded before the filtering is applied. In some embodiments of the invention, performing the loop filter as each macroblock is decoded has the advantage of processing the pixels while they are in on-chip memory, rather than writing out pixels and reading them back in later, which consumes more power and adds delay. However, if macroblocks are decoded out of order, as with FMO or ASO, the pixels from neighboring macroblocks may not be available when the macroblock is decoded; in this case, macroblock-based loop filtering cannot be performed. For FMO or ASO, the loop filtering is delayed until after all macroblocks are decoded for the frame, and the pixels must be reread in a second pass to perform frame-based loop filtering. The output of the deblocking filter component (316) is the decoded frames of the video bit stream. Each decoded frame is stored in frame storage (312) to be used as a reference frame.

FIG. 4 shows a flow diagram of a method of CABAC entropy encoding in accordance with one or more embodiments of the invention. More specifically, the method illustrates the generation of entropy slices as a part of CABAC entropy encoding. In general, slices of video data are partitioned into one or more entropy slices during CABAC entropy encoding of the syntax element values of the slices. As is well known in the art, a slice may be a subset of macroblocks in a picture (frame) or may be the entire picture. As shown in FIG. 4, a syntax element value of a slice is entropy encoded using CABAC (400) to generate a bin string representing the syntax element. Consecutive syntax element values are encoded until either sufficient syntax elements have been encoded to fulfill the size criteria for an entropy slice or the last syntax element value in a slice is encoded (402). The size of an entropy slice may be set, for example, as some number of macroblocks or a maximum number of bits.

When sufficient syntax element values have been encoded (or the last syntax element value in a slice has been encoded), an entropy slice header is generated for the entropy slice (404). The entropy slice header includes information identifying the header as that of an entropy slice and context model initialization information to be used by a CABAC entropy decoder to initialize context models prior to decoding the entropy slice. The context model initialization information may be, for example, initial values for context variables or parameters that can be used to calculate the initial values for context variables for selected context models. The generation of the context model initialization information is described in more detail below. The entropy slice header and the entropy encoded syntax element values are then output in a bit stream (406). The process is repeated until all syntax element values in the slice are entropy encoded and included in an entropy slice (408).
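The control flow of FIG. 4 can be sketched in Python as follows. The sketch makes several simplifying assumptions that are not part of the described method: the size criterion is a count of syntax element values rather than macroblocks or bits, the CABAC coder and the generation of initialization information are supplied by the caller as stand-in functions, and the header is a plain dictionary. All names are hypothetical.

```python
def encode_slice(syntax_values, max_values_per_entropy_slice,
                 encode_value, make_init_info):
    """Partition one slice into entropy slices.

    encode_value:   stand-in for CABAC encoding of one value (400).
    make_init_info: stand-in that returns context model initialization
                    information for the values of one entropy slice.
    Returns a list of (header, encoded values) pairs."""
    if max_values_per_entropy_slice < 1:
        raise ValueError("entropy slice size must be at least 1")
    bit_stream = []
    pending = []
    for position, value in enumerate(syntax_values):
        pending.append(encode_value(value))                      # (400)
        last = position == len(syntax_values) - 1
        if len(pending) >= max_values_per_entropy_slice or last: # (402)
            header = {                                           # (404)
                "is_entropy_slice": True,
                "context_init": make_init_info(pending),
            }
            bit_stream.append((header, pending))                 # (406)
            pending = []
    return bit_stream                                            # (408)
```

With seven syntax element values and a limit of three values per entropy slice, the sketch emits three entropy slices holding three, three, and one encoded values, each preceded by its own header.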

In one or more embodiments of the invention, the context model initialization information included in the entropy slice header is initialization information for a subset of the context models. The size of the subset may be set to be very small compared to the entire context space and may be chosen as a tradeoff between providing more initialization information to improve the throughput of the entropy decoder and potential bit rate increase due to including initialization information for more context models. The selection of which context models to include in the subset may be based on which of the context models are most frequently used. In other words, context initialization information for the more frequently used context models will be included in the subset in preference to context initialization information for less frequently used context models. For example, coefficient level syntax element bin strings account for a large portion of video bit streams. Therefore, context models corresponding to coefficient levels will be used most frequently in decoding of entropy slices and the corresponding context models may be given preference for inclusion in the subset of context models for which context initialization information is provided in an entropy slice header.

In some embodiments of the invention, the context models included in the subset are statically defined. More specifically, when an entropy slice header is generated, the context model initialization information for a fixed, predefined subset of context models is included in the header. The fixed, predefined subset of context models may vary by slice type (I-slice, B-slice, P-slice). For example, the subset of context models for a B-slice or a P-slice may include context models related to motion vectors while the subset for an I-slice may not. The optimal context subsets may be determined empirically by statistical analysis of video sequences at various resolutions, bit rates, and configurations.
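A statically defined subset that varies by slice type may be pictured as a simple lookup table. The following is a sketch only: the context model group names are hypothetical placeholders, since the actual subsets are to be determined empirically and are not listed here.

```python
from typing import Dict, List

# Hypothetical context model group names (assumption); the disclosure does not
# enumerate the members of any subset.
_COMMON = ["coefficient_level"]
STATIC_SUBSETS: Dict[str, List[str]] = {
    "I": _COMMON,                                 # no motion vector contexts
    "P": _COMMON + ["motion_vector_difference"],
    "B": _COMMON + ["motion_vector_difference"],
}


def static_subset(slice_type: str) -> List[str]:
    """Return the fixed, predefined subset for a slice type (I, P, or B)."""
    try:
        return list(STATIC_SUBSETS[slice_type])
    except KeyError:
        raise ValueError(f"unknown slice type: {slice_type!r}") from None
```

Because the table is fixed, an encoder and decoder sharing it would not need to signal context indices in the entropy slice header.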

In some embodiments of the invention, the context models included in the subset are dynamically selected. For example, the entropy encoder may keep track of how frequently each of the possible context models is used while encoding the syntax element values for an entropy slice. The most frequently used context models (based on a threshold) may then be selected for inclusion in the subset of context models for which context initialization information is provided in the entropy slice header. In another example, context models in which the context variable values have deviated significantly (based on a threshold) from their initial default values during the encoding of the syntax element values in the entropy slice may be selected for inclusion in the subset. The maximum size of the subset may be fixed even though the actual context models to be included in the subset may change dynamically. When the context models in the subset are dynamically selected, the context indices for the selected context models are included in the entropy slice header along with the context model initialization information.
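Both dynamic selection examples may be sketched as follows. The function names, the dictionary representation of usage counts and context states, and the tie-breaking by lower context index are assumptions made for illustration.

```python
from typing import Dict, List


def select_by_usage(usage: Dict[int, int], threshold: int, max_size: int) -> List[int]:
    """Pick the most frequently used context indices, capped at max_size."""
    candidates = [(count, idx) for idx, count in usage.items() if count >= threshold]
    # Most used first; ties broken by lower context index (assumption).
    candidates.sort(key=lambda item: (-item[0], item[1]))
    return sorted(idx for _, idx in candidates[:max_size])


def select_by_deviation(
    defaults: Dict[int, int], finals: Dict[int, int], threshold: int, max_size: int
) -> List[int]:
    """Pick contexts whose state moved furthest from its default initial value."""
    deviation = {idx: abs(finals[idx] - defaults[idx]) for idx in finals}
    # Reuse the same threshold-and-cap rule on the deviation magnitudes.
    return select_by_usage(deviation, threshold, max_size)
```

The returned context indices are the values that would accompany the initialization information in the entropy slice header when the subset is dynamically selected.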

In one or more embodiments of the invention, the context models included in the subset may be both statically and dynamically defined. For example, a few context models may be always included in the subset while others may be dynamically selected.

In one or more embodiments of the invention, the context initialization information included in the entropy slice header may be compressed. For example, the context variable values for a context model may be defined with a fixed number of bits, e.g., seven bits as in H.264. For each context model included in the subset, the difference between the fixed bit value of the default initial context variable values defined for the context model and the fixed bit value of the context variable values after encoding of syntax element values in an entropy slice may be computed and only the difference included in the entropy slice header. In some embodiments of the invention, the difference values are also quantized and compressed using any appropriate known simple entropy encoding technique. As was previously described, if the context models included in the subset are dynamically selected, the context indices for the selected context models are also included in the entropy slice header. The simple entropy encoding technique may also be used to compress the context indices.
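The difference-based compression may be illustrated with a short sketch. It assumes seven-bit context variable values as in the example above, a uniform quantizer with a hypothetical `step` parameter, and it omits the subsequent simple entropy encoding of the quantized differences.

```python
from typing import Dict

STATE_BITS = 7                     # fixed width taken from the text's example
STATE_MAX = (1 << STATE_BITS) - 1  # largest seven-bit value, 127


def compress_states(defaults: Dict[int, int], finals: Dict[int, int], step: int = 1) -> Dict[int, int]:
    """Return quantized differences between final and default context values."""
    if step <= 0:
        raise ValueError("step must be positive")
    deltas = {}
    for idx, final in finals.items():
        delta = final - defaults[idx]
        # Uniform quantization, rounding to the nearest multiple of step.
        deltas[idx] = (delta + step // 2) // step
    return deltas


def decompress_states(defaults: Dict[int, int], deltas: Dict[int, int], step: int = 1) -> Dict[int, int]:
    """Rebuild context values from quantized differences, clamped to seven bits."""
    return {
        idx: min(STATE_MAX, max(0, defaults[idx] + q * step))
        for idx, q in deltas.items()
    }
```

With `step` equal to one the round trip is lossless; a larger step trades accuracy of the restored context values for fewer header bits.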

FIG. 5 shows a flow diagram of a method of parallel CABAC entropy decoding in accordance with one or more embodiments of the invention. More specifically, the method illustrates the parallel decoding of entropy slices in a slice as a part of CABAC entropy decoding. In this method, the entropy slice headers include context model initialization information. The method as depicted in FIG. 5 assumes that some number N of entropy slices may be decoded in parallel. The value of the number N depends on, among other things, the processing capabilities of a digital system, i.e., the number of processing units available for parallel processing, on which the method is implemented.

As shown in FIG. 5, a bit stream of encoded video data is parsed to identify entropy slice headers in a slice (500). In one or more embodiments of the invention, as each entropy slice header is identified, the decoding of the entropy slice is delegated to one of the N processing units. For example, the delegation may be to an idle processing unit and/or the delegation may take the form of enqueuing the entropy slice for decoding on the next available processing unit. Accordingly, an initial entropy slice in the bit stream can be delegated to a processing unit for decoding and the decoding initiated before the slice header of the next entropy slice is identified and the associated entropy slice is delegated to another processing unit for decoding.
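The delegation of entropy slices to N processing units may be sketched with a worker pool. In this sketch the pool is Python's `concurrent.futures.ThreadPoolExecutor`, standing in for the processing units, and `decode_one` is a hypothetical per-slice decoding routine; a real decoder would more likely use hardware units or separate processes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

S = TypeVar("S")
R = TypeVar("R")


def decode_entropy_slices(
    slices: Iterable[S], decode_one: Callable[[S], R], n_units: int
) -> List[R]:
    """Enqueue each entropy slice as soon as it is identified in the bit stream."""
    with ThreadPoolExecutor(max_workers=n_units) as pool:
        # `slices` may be a generator produced by the bit stream parser, so the
        # first slice can begin decoding before the next header is identified.
        futures = [pool.submit(decode_one, s) for s in slices]
        # Results are collected in bit stream order.
        return [future.result() for future in futures]
```

Submitting to the pool corresponds to enqueuing the entropy slice for decoding on the next available processing unit.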

Each identified entropy slice is then entropy decoded. For each of the entropy slices, context models are initialized using the context model initialization information in the respective entropy slice header (502 a, 502 b, 502 c). More specifically, the context model initialization information in the entropy slice header is parsed and used to initialize corresponding context models.

In some embodiments of the invention, the context model initialization information in the entropy slice header is for a statically defined subset of the context models. In some embodiments of the invention, the context model initialization information in the entropy slice header is for a subset of the context models dynamically selected by the entropy encoder. In such embodiments, the respective entropy slice headers are parsed to determine the context indices of the context models for which context model initialization information is provided. Further, in one or more embodiments of the invention, the context model initialization information and context indices, if present, are compressed as previously described. In such embodiments, the context model initialization information and context indices, if present, are decompressed.

In one or more embodiments of the invention, context models that were not included in the subset of context models for which context model initialization information is provided in the entropy slice header are reset to default initial states. In some embodiments of the invention, estimates of context model initialization information of such context models are generated based on the context model initialization information included in the entropy slice header. The estimates may be generated, for example, based on identified correlations between the evolution of probability models in different contexts. Such correlations may be characterized empirically by analysis of CABAC encoding of video sequences at various resolutions, bit rates, and configurations.
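Initialization from a partial subset may be sketched as follows. The dictionary representation and the optional `estimator` hook are assumptions; the hook merely marks where a correlation-based estimate could be supplied and does not implement any particular correlation model.

```python
from typing import Callable, Dict, Optional

Estimator = Callable[[int, Dict[int, int], Dict[int, int]], int]


def init_context_models(
    defaults: Dict[int, int],
    header_init: Dict[int, int],
    estimator: Optional[Estimator] = None,
) -> Dict[int, int]:
    """Initialize all context models from defaults plus the header subset."""
    unknown = set(header_init) - set(defaults)
    if unknown:
        raise ValueError(f"unknown context indices: {sorted(unknown)}")
    states = dict(defaults)  # contexts outside the subset reset to defaults
    if estimator is not None:
        for idx in defaults:
            if idx not in header_init:
                # Hypothetical hook: estimate from the signaled values.
                states[idx] = estimator(idx, header_init, defaults)
    states.update(header_init)  # signaled values always take precedence
    return states
```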

After initialization of the context models, the entropy encoded syntax element values in the respective entropy slices are decoded (504 a, 504 b, 504 c) and the decoded syntax element values are output (506 a, 506 b, 506 c). The process is repeated until all entropy slices in the slice are decoded (508).

FIG. 6 shows a flow diagram of a method of parallel CABAC entropy decoding in accordance with one or more embodiments of the invention. More specifically, the method illustrates the parallel decoding of entropy slices in a slice as a part of CABAC entropy decoding. In some embodiments of the invention, the entropy slice headers do not include context model initialization information as previously described. In other embodiments of the invention, the entropy slice headers do include context model initialization information. The method assumes a limitation M on the maximum number of entropy slices that may be entropy decoded in parallel. This limitation may be set by the video encoder that generated the bit stream to be decoded and communicated in the bit stream.

Limiting the maximum number of entropy slices decoded in parallel to M ensures that entropy slices 1 . . . x and an entropy slice x+M are never decoded in parallel. In such a case, an entropy slice x+M can be allowed to have some dependency on one or more of the entropy slices 1 . . . x. For example, the context models for entropy slice x+M can be initialized with estimated values based on the final states of the context models from one or more, i.e., a subset, of entropy slices 1 . . . x. The subset of the M entropy slices on which the context initialization of an entropy slice may depend is referred to as the reference entropy slice subset. In one or more embodiments of the invention, in addition to communicating the maximum number M of entropy slices that may be entropy decoded in parallel, the video encoder also identifies in the entropy slice header of an entropy slice x+M which of the 1 . . . x preceding entropy slices are included in the reference entropy slice subset.

The method as depicted in FIG. 6 assumes that N entropy slices may be decoded in parallel. The value of the number N depends on, among other things, the processing capabilities of a digital system, i.e., the number of processing units available for parallel processing, on which the method is implemented. In addition, the value of N is less than or equal to M. That is, even if the digital system has more than M processing units, only M entropy slices are permitted to be decoded in parallel.

As shown in FIG. 6, a bit stream of encoded video data is parsed to identify entropy slice headers in a slice (600). In one or more embodiments of the invention, as each entropy slice header is identified, the decoding of the entropy slice is delegated to one of the N processing units. For example, the delegation may be to an idle processing unit and/or the delegation may take the form of enqueuing the entropy slice for decoding on the next available processing unit. The delegation may also be scheduled to ensure that the decoding of an entropy slice is not started until the entropy slices in the reference entropy slice subset of the entropy slice are decoded.
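Scheduling that respects the reference entropy slice subsets may be sketched as follows. The sketch assumes zero-based slice positions, a list `refs` giving the reference subset of each slice, and a hypothetical `decode_one` routine that receives the results of the reference slices; a thread pool again stands in for the processing units.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List, Sequence


def decode_with_references(
    slices: Sequence[object],
    refs: Sequence[Sequence[int]],
    decode_one: Callable[[object, list], object],
    units: int,
    m_limit: int,
) -> list:
    """Decode entropy slices, waiting on each slice's reference subset first."""
    n = min(units, m_limit)  # N may not exceed the signaled limit M
    futures: List[Future] = []

    def task(i: int) -> object:
        # Block until every reference slice has finished decoding.
        ref_results = [futures[r].result() for r in refs[i]]
        return decode_one(slices[i], ref_results)

    with ThreadPoolExecutor(max_workers=n) as pool:
        for i in range(len(slices)):
            # Slice x+M may depend only on slices 1..x, i.e. r + M <= i here.
            if any(r < 0 or r + m_limit > i for r in refs[i]):
                raise ValueError(f"invalid reference subset for slice {i}")
            futures.append(pool.submit(task, i))
        return [future.result() for future in futures]
```

Because tasks are submitted in bit stream order and references always point to earlier slices, the earliest unfinished slice never waits on an unstarted one, so the pool does not stall.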

Each identified entropy slice is then entropy decoded. To entropy decode an entropy slice, the context models are initialized (602 a, 602 b, 602 c). If the entropy slice is not one of the first M entropy slices in the bit stream, the context models are initialized using context model initialization information estimated from the final states of the context models of the entropy slices in the reference entropy slice subset of the entropy slice. The context model initialization information may be estimated, for example, by using the weighted combination of the probability models resulting from decoding the entropy slices in the reference entropy slice subset. Combination weights can be calculated, for example, using the spatial pixel distance in a picture between an entropy slice and the reference entropy slices. If the entropy slice is one of the first M entropy slices, the context models are initialized in some other manner. In some embodiments of the invention, the context models for an entropy slice in the first M entropy slices are initialized using default initialization values. In some embodiments of the invention, the context models for an entropy slice in the first M entropy slices are initialized using context model initialization information from the respective entropy slice headers as previously described.
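The weighted combination may be illustrated with a small sketch. It assumes that each probability model is represented as a single probability per context index and that the combination weights are inversely proportional to the spatial pixel distance; neither the representation nor the weighting rule is mandated by the description above.

```python
from typing import Dict, Sequence


def estimate_initial_states(
    ref_states: Sequence[Dict[int, float]], distances: Sequence[float]
) -> Dict[int, float]:
    """Blend final context probabilities of the reference entropy slices."""
    if not ref_states or len(ref_states) != len(distances):
        raise ValueError("need one distance per reference entropy slice")
    if any(d <= 0 for d in distances):
        raise ValueError("distances must be positive")
    # Closer reference slices get larger weights (assumed inverse-distance rule).
    weights = [1.0 / d for d in distances]
    total = sum(weights)
    return {
        ctx: sum(w * state[ctx] for w, state in zip(weights, ref_states)) / total
        for ctx in ref_states[0]
    }
```

For two reference slices at distances 1 and 3 whose final probabilities for a context are 0.2 and 0.8, the weights are 1 and 1/3, giving an estimate of (0.2 + 0.8/3) / (4/3) = 0.35.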

After initialization of the context models, the entropy encoded syntax element values in the entropy slice are decoded (604 a, 604 b, 604 c) and the decoded syntax element values are output (606 a, 606 b, 606 c). The process is repeated until all entropy slices in the slice are decoded (608).

FIG. 7 shows a digital system (700) (e.g., a personal computer) that includes a processor (702), associated memory (704), a storage device (706), and numerous other elements and functionalities typical of digital systems (not shown). In one or more embodiments of the invention, a digital system may include multiple processors and/or one or more of the processors may be digital signal processors. The digital system (700) may also include input means, such as a keyboard (708) and a mouse (710) (or other cursor control device), and output means, such as a monitor (712) (or other display device). The digital system (700) may also include an image capture device (not shown) that includes circuitry (e.g., optics, a sensor, readout electronics) for capturing video sequences. The digital system (700) may be connected to a network (714) (e.g., a local area network (LAN), a wide area network (WAN) such as the Internet, a cellular network, any other similar type of network and/or any combination thereof) via a network interface connection (720). Video image data may be received via the network. Those skilled in the art will appreciate that these input and output means may take other forms.

The digital system (700) may include a video encoder and/or video decoder with functionality to perform, respectively, embodiments of methods for CABAC encoding and parallel CABAC decoding as described herein. The video decoder may be configured to decode video image data received over a network or from storage media coupled to the storage device (706). The digital system (700) may be further configured to display the decoded video data stream, such as a movie or other type of video images, on the monitor (712).

Further, those skilled in the art will appreciate that one or more elements of the aforementioned digital system (700) may be located at a remote location and connected to the other elements over a network. Further, embodiments of the invention may be implemented on a distributed system having a plurality of nodes, where each portion of the system and software instructions may be located on a different node within the distributed system. In one embodiment of the invention, the node may be a digital system. Alternatively, the node may be a processor with associated physical memory. The node may alternatively be a processor with shared memory and/or resources.

Software instructions to perform embodiments of the invention may be stored on a computer readable medium such as a compact disc (CD), a diskette, a tape, a file, or any other computer readable storage device. The software instructions may be distributed to the digital system (700) via removable memory (e.g., floppy disk, optical disk, flash memory, USB key), via a transmission path, etc.

FIG. 8 is a block diagram of a digital system (e.g., a mobile cellular telephone) (800) that may be configured to perform the methods described herein. The signal processing unit (SPU) (802) includes a digital signal processor system (DSP) that includes embedded memory and security features. The analog baseband unit (804) receives a voice data stream from handset microphone (813 a) and sends a voice data stream to the handset mono speaker (813 b). The analog baseband unit (804) also receives a voice data stream from the microphone (814 a) and sends a voice data stream to the mono headset (814 b). The analog baseband unit (804) and the SPU (802) may be separate ICs. In many embodiments, the analog baseband unit (804) does not embed a programmable processor core, but performs processing based on configuration of audio paths, filters, gains, etc., being set up by software running on the SPU (802). In some embodiments, the analog baseband processing is performed on the same processor and can send information to it for interaction with a user of the digital system (800) during call processing or other processing.

The display (820) may also display pictures and video streams received from the network, from a local camera (828), or from other sources such as the USB (826) or the memory (812). The SPU (802) may also send a video stream to the display (820) that is received from various sources such as the cellular network via the RF transceiver (806) or the camera (828). The SPU (802) may also send a video stream to an external video display unit via the encoder (822) over a composite output terminal (824). The encoder unit (822) may provide encoding according to PAL/SECAM/NTSC video standards.

The SPU (802) includes functionality to perform the computational operations required for video compression and decompression. The video compression standards supported may include, for example, one or more of the JPEG standards, the MPEG standards, and the H.26x standards. In one or more embodiments of the invention, the SPU (802) is configured to perform the computational operations of one or more of the methods described herein. Software instructions implementing aspects of the methods may be stored in the memory (812) and executed by the SPU (802) during encoding and decoding of video sequences.

FIG. 9 shows a digital system suitable for an embedded system (e.g., a digital camera) in accordance with one or more embodiments of the invention that includes, among other components, a DSP-based image coprocessor (ICP) (902), a RISC processor (904), and a video processing engine (VPE) (906) that may be configured to perform methods for digital image data compression and decompression described herein. The RISC processor (904) may be any suitably configured RISC processor. The VPE (906) includes a configurable video processing front-end (Video FE) (908) input interface used for video capture from imaging peripherals such as image sensors, video decoders, etc., a configurable video processing back-end (Video BE) (910) output interface used for display devices such as SDTV displays, digital LCD panels, HDTV video encoders, etc., and a memory interface (924) shared by the Video FE (908) and the Video BE (910). The digital system also includes peripheral interfaces (912) for various peripherals that may include a multi-media card, an audio serial port, a Universal Serial Bus (USB) controller, a serial port interface, etc.

The Video FE (908) includes an image signal processor (ISP) (916), and a 3A statistic generator (3A) (918). The ISP (916) provides an interface to image sensors and digital video sources. More specifically, the ISP (916) may accept raw image/video data from a sensor (CMOS or CCD) and can accept YUV video data in numerous formats. The ISP (916) also includes a parameterized image processing module with functionality to generate image data in a color format (e.g., RGB) from raw CCD/CMOS data. The ISP (916) is customizable for each sensor type and supports video frame rates for preview displays of captured digital images and for video recording modes. The ISP (916) also includes, among other functionality, an image resizer, statistics collection functionality, and a boundary signal calculator. The 3A module (918) includes functionality to support control loops for auto focus, auto white balance, and auto exposure by collecting metrics on the raw image data from the ISP (916) or external memory.

The Video BE (910) includes an on-screen display engine (OSD) (920) and a video analog encoder (VAC) (922). The OSD engine (920) includes functionality to manage display data in various formats for several different types of hardware display windows and it also handles gathering and blending of video data and display/bitmap data into a single display window before providing the data to the VAC (922) in YCbCr format. The VAC (922) includes functionality to take the display frame from the OSD engine (920) and format it into the desired output format and output signals required to interface to display devices. The VAC (922) may interface to composite NTSC/PAL video devices, S-Video devices, digital LCD devices, high-definition video encoders, DVI/HDMI devices, etc.

The memory interface (924) functions as the primary source and sink to modules in the Video FE (908) and the Video BE (910) that are requesting and/or transferring data to/from external memory. The memory interface (924) includes read and write buffers and arbitration logic.

The ICP (902) includes functionality to perform the computational operations required for video encoding and decoding and other processing of captured images. The video encoding standards supported may include one or more of the JPEG standards, the MPEG standards, and the H.26x standards. In one or more embodiments of the invention, the ICP (902) is configured to perform the computational operations of embodiments of the methods described herein.

In operation, to capture an image or video sequence, video signals are received by the video FE (908) and converted to the input format needed to perform video encoding. The video data generated by the video FE (908) is then stored in external memory. The video data is then encoded by a video encoder. As the video data is encoded, a method for CABAC entropy encoding as described herein is applied. The resulting encoded video data is stored in the external memory. The encoded video data may then be read from the external memory, decoded by a video decoder that implements a method for parallel CABAC entropy decoding as described herein, and post-processed by the video BE (910) to display the image/video sequence.

While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this disclosure, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as disclosed herein. Accordingly, the scope of the invention should be limited only by the attached claims.

Embodiments of the decoders and methods described herein may be provided on any of several types of digital systems: digital signal processors (DSPs), general purpose programmable processors, application specific circuits, or systems on a chip (SoC) such as combinations of a DSP and a reduced instruction set (RISC) processor together with various specialized accelerators. A stored program in an onboard or external (flash EEP) ROM or FRAM may be used to implement aspects of the video signal processing. Analog-to-digital converters and digital-to-analog converters provide coupling to the real world, and modulators and demodulators (plus antennas for air interfaces) can provide coupling for waveform reception of video data being broadcast over the air by satellite, TV stations, cellular networks, etc., or via wired networks such as the Internet.

The techniques described in this disclosure may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the software may be executed in one or more processors, such as a microprocessor, application specific integrated circuit (ASIC), field programmable gate array (FPGA), or digital signal processor (DSP). The software that executes the techniques may be initially stored in a computer-readable medium such as compact disc (CD), a diskette, a tape, a file, memory, or any other computer readable storage device and loaded and executed in the processor. In some cases, the software may also be sold in a computer program product, which includes the computer-readable medium and packaging materials for the computer-readable medium. In some cases, the software instructions may be distributed via removable computer readable media (e.g., floppy disk, optical disk, flash memory, USB key), via a transmission path from computer readable media on another digital system, etc.

Embodiments of the methods and video decoders for performing parallel bin decoding as described herein may be implemented for virtually any type of digital system (e.g., a desktop computer, a laptop computer, a set-top box for satellite or cable, a handheld device such as a mobile (i.e., cellular) phone, a personal digital assistant, a digital camera, etc.) with functionality to decode digital video images.

It is therefore contemplated that the appended claims will cover any such modifications of the embodiments as fall within the true scope and spirit of the invention. 

1. A method of video encoding comprising: performing context-adaptive binary arithmetic coding (CABAC) on a plurality of syntax element values in a slice to generate a plurality of entropy-encoded syntax element values; generating an entropy slice header to identify the plurality of entropy-encoded syntax element values as an entropy slice, wherein the entropy slice header comprises context model initialization information; and outputting the entropy slice header and the plurality of entropy encoded syntax element values.
 2. The method of claim 1, wherein the context model initialization information comprises values for setting an initial state of a selected context model.
 3. The method of claim 1, wherein the context model initialization information comprises values for setting an initial state of each of a subset of context models selected from a plurality of context models.
 4. The method of claim 3, wherein at least one context model included in the subset is empirically selected.
 5. The method of claim 3, wherein at least one context model in the subset is dynamically selected based on use of the context model during the CABAC coding of the plurality of syntax element values.
 6. The method of claim 5, wherein a context index of the at least one context model in the subset is comprised in the entropy slice header.
 7. The method of claim 3, wherein at least one context model in the subset is selected based on a type of the slice.
 8. The method of claim 3, wherein generating an entropy slice header comprises compressing the values.
 9. A method of video decoding comprising: identifying a plurality of entropy slice headers in a slice; and decoding entropy slices corresponding to the plurality of entropy slice headers in parallel, wherein each entropy slice comprises a plurality of entropy-encoded syntax element values encoded using context-adaptive binary arithmetic coding (CABAC), and wherein decoding each entropy slice comprises: initializing a context model of a plurality of context models using context model initialization information comprised in the entropy slice header corresponding to the entropy slice; decoding the plurality of entropy-encoded syntax element values comprised in the entropy slice to generate a plurality of syntax element values, wherein the initialized context model is used; and outputting the plurality of syntax element values.
 10. The method of claim 9, wherein the context model initialization information comprises values for setting an initial state of each of a subset of context models selected from the plurality of context models.
 11. The method of claim 10, wherein at least one context model included in the subset was empirically selected.
 12. The method of claim 10, wherein at least one context model in the subset was dynamically selected based on use of the context model during the CABAC encoding of the plurality of syntax element values.
 13. The method of claim 12, wherein a context index of the at least one context model in the subset is comprised in the entropy slice header corresponding to the entropy slice.
 14. The method of claim 10, wherein at least one context model in the subset was selected based on a type of the slice.
 15. The method of claim 9, further comprising decompressing the context model initialization information.
 16. A method for video decoding comprising: identifying a plurality of entropy slice headers in a slice; and decoding entropy slices corresponding to the plurality of entropy slice headers in parallel, wherein each entropy slice comprises a plurality of entropy-encoded syntax element values encoded using context-adaptive binary arithmetic coding (CABAC), and wherein decoding each entropy slice comprises: estimating context model initialization information for a context model of a plurality of context models based on decoding of a reference entropy slice subset; initializing the context model using the estimated context model initialization information; decoding the plurality of entropy-encoded syntax element values comprised in the entropy slice to generate a plurality of syntax element values, wherein the initialized context model is used; and outputting the plurality of syntax element values.
 17. The method of claim 16, wherein the reference entropy slice subset is identified in the entropy slice header corresponding to the entropy slice.
 18. The method of claim 16, wherein the reference entropy slice subset comprises at least one of M sequential entropy slices decoded prior to decoding the entropy slice, wherein M is a maximum number of entropy slices that can be decoded in parallel.
 19. The method of claim 18, wherein the value of M is set by a video encoder that generates the plurality of entropy slices. 